Numerical Analysis Project Presentation

SALAKO ABDULHAQ ADETUNJI 201924080119
EJIYI CHUKWUEBUKA JOSEPH 201924090104

Regression For Data Analysis

Building Insurance Prediction With Regression Analysis

Outline

Introduction

Regression measures the average relationship between two or more variables in terms of the original units of the data.

Regression analysis is a predictive modelling technique that investigates the relationship between a dependent (target) variable and one or more independent (predictor) variables.

Recall that Linear Regression is used to determine the value of a continuous dependent variable. Logistic Regression, by contrast, is generally used for classification: the dependent variable is categorical, so it can take only a limited number of values. When there are only two possible outcomes, it is called Binary Logistic Regression.

Importance of Regression Analysis

Provides estimates of the values of dependent variables from the values of independent variables

Can be extended to two or more independent variables, which is known as multiple regression

Shows the nature of the relationship between two or more variables

Let’s look at how logistic regression can be used for classification tasks.

In Linear Regression, the output is the weighted sum of inputs. Logistic Regression generalizes Linear Regression in the sense that we don’t output the weighted sum of inputs directly; instead, we pass it through a function that maps any real value into the range between 0 and 1.

As the figure below shows, the output of the linear regression is passed through an activation function that maps any real value into the range between 0 and 1.
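The mapping described above can be sketched in a few lines of Python. This is a minimal illustration, assuming the activation is the standard logistic (sigmoid) function; the weights, bias and feature values are hypothetical and chosen only to show the mechanics.

```python
import math


def sigmoid(z: float) -> float:
    """Logistic (sigmoid) function: squashes any real z into the range 0..1."""
    # Two branches keep exp() from overflowing when |z| is large.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def predict_probability(weights, bias, features):
    """Weighted sum of inputs (as in linear regression), passed through sigmoid."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)


# Hypothetical weights and inputs, for illustration only.
p = predict_probability(weights=[0.8, -0.5], bias=0.1, features=[2.0, 1.0])
print(round(p, 3))
```

A weighted sum of 0 maps to a probability of exactly 0.5, which is why 0.5 is the usual default threshold for turning the probability into a class label.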

What is Insurance Prediction and Why Does it Matter?

Recently, there has been an increase in the number of building collapses in Lagos and other major cities in Nigeria. Olusola Insurance Company offers a building insurance policy that protects buildings against damage caused by fire, vandalism, flood or storm.

We have been appointed as the Lead Data Analysts to build a predictive model that determines whether a building will have an insurance claim during a certain period. The task is to predict the probability of at least one claim over the insured period of the building.

The model will be based on the building characteristics. The target variable, Claim, is:

1 if the building has at least one claim over the insured period.

0 if the building doesn’t have a claim over the insured period.

Exploratory Data Analysis (EDA)

We use EDA to discover underlying patterns, spot anomalies, frame hypotheses and check assumptions, with the aim of finding a well-fitting model (if one exists).

Let's get started with some graphical visualisations of the data. We first import the necessary libraries and then use some of the tools in these libraries to visualise the data.
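As a rough, dependency-free stand-in for this step, the sketch below summarises a tiny dataset using only the Python standard library. The records and the column names (building_type, dimension, claim) are hypothetical and are not taken from the project data; in practice a plotting library would typically be used to draw the charts.

```python
from collections import Counter
from statistics import mean, median

# Hypothetical records; names and values are illustrative only.
records = [
    {"building_type": "A", "dimension": 300.0, "claim": 0},
    {"building_type": "A", "dimension": 450.0, "claim": 0},
    {"building_type": "B", "dimension": 1200.0, "claim": 1},
    {"building_type": "B", "dimension": 800.0, "claim": 0},
    {"building_type": "A", "dimension": 1500.0, "claim": 1},
    {"building_type": "B", "dimension": 500.0, "claim": 0},
]


def class_balance(rows, target="claim"):
    """Count how many rows fall in each target class."""
    return Counter(row[target] for row in rows)


def summarize_by_class(rows, feature, target="claim"):
    """Mean and median of a numeric feature within each target class."""
    groups = {}
    for row in rows:
        groups.setdefault(row[target], []).append(row[feature])
    return {
        label: {"mean": mean(values), "median": median(values)}
        for label, values in groups.items()
    }


def text_histogram(counts):
    """Render counts as a crude text bar chart, one bar per category."""
    return "\n".join(
        f"{label}: {'#' * n} ({n})" for label, n in sorted(counts.items())
    )


balance = class_balance(records)
summary = summarize_by_class(records, "dimension")
print(text_histogram(balance))
print(summary)
```

Checking the class balance first is useful because a target with far fewer claims than non-claims affects how accuracy should be interpreted later.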

Data Preprocessing and Feature Selection

Data Preprocessing is the step in which the data is transformed, or encoded, into a state that the machine can easily parse. In other words, the features of the data can then be easily interpreted by the algorithm.

Feature encoding means transforming the data so that it can be accepted as input by machine learning algorithms while still retaining its original meaning.
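One common encoding for categorical features is one-hot encoding, sketched below in plain Python. This is an illustrative choice, not necessarily the encoding used in the project, and the column names (settlement, area) are hypothetical.

```python
def one_hot_encode(rows, column):
    """Replace a categorical column with one 0/1 indicator column per category.

    Returns the encoded rows and the sorted list of categories, so the
    original value can still be recovered from the indicator columns.
    """
    categories = sorted({row[column] for row in rows})
    encoded = []
    for row in rows:
        # Copy every column except the one being encoded.
        new_row = {key: value for key, value in row.items() if key != column}
        for category in categories:
            new_row[f"{column}_{category}"] = 1 if row[column] == category else 0
        encoded.append(new_row)
    return encoded, categories


# Hypothetical rows, for illustration only.
rows = [
    {"settlement": "U", "area": 10},
    {"settlement": "R", "area": 5},
]
encoded, categories = one_hot_encode(rows, "settlement")
print(encoded)
```

Each row ends up with exactly one indicator set to 1, so the numeric representation carries the same information as the original label.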

Missing values: it is very common to have missing values in a dataset. Regardless, they must be taken into consideration, and there are two common ways to handle them:

Eliminate rows with missing data

Estimate missing values
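Both options can be sketched in plain Python as follows. Missing entries are assumed to be represented as None, the mean is used as one simple way of estimating a missing numeric value, and the column names are hypothetical.

```python
from statistics import mean


def drop_missing(rows):
    """Option 1: eliminate every row that has at least one missing (None) value."""
    return [row for row in rows if all(value is not None for value in row.values())]


def impute_mean(rows, column):
    """Option 2: estimate missing values in a numeric column with the column mean.

    Raises statistics.StatisticsError if the column has no observed values.
    """
    observed = [row[column] for row in rows if row[column] is not None]
    fill = mean(observed)
    return [
        {**row, column: fill if row[column] is None else row[column]}
        for row in rows
    ]


# Hypothetical rows, for illustration only.
rows = [
    {"dimension": 100.0, "claim": 0},
    {"dimension": None, "claim": 1},
    {"dimension": 300.0, "claim": 0},
]
dropped = drop_missing(rows)
imputed = impute_mean(rows, "dimension")
print(len(dropped), imputed[1]["dimension"])
```

Dropping rows is simplest but discards data (here, the only row with a claim), while imputation keeps every row at the cost of introducing estimated values.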

Feature Selection

Feature Selection is the process of automatically or manually selecting the features that contribute most to the prediction variable or output you are interested in.

Univariate selection method

Correlation Matrix with Heatmap
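The two methods above can be illustrated with a small sketch. It assumes the Pearson correlation coefficient both for the matrix and as the per-feature score; univariate selection is often done with other statistical tests, so the scoring rule here is a simplification. The column names and values are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        return 0.0  # constant column: correlation is undefined, treated as 0 here
    return cov / (spread_x * spread_y)


def correlation_matrix(columns):
    """Pairwise correlations; these are the values a heatmap would colour."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


def rank_features(columns, target):
    """Score each feature on its own by |correlation with the target|."""
    scores = {
        name: abs(pearson(values, columns[target]))
        for name, values in columns.items()
        if name != target
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical columns, for illustration only.
columns = {
    "dimension": [1, 2, 3, 4, 5],
    "noise": [2, 1, 2, 1, 2],
    "claim": [0, 0, 0, 1, 1],
}
matrix = correlation_matrix(columns)
ranking = rank_features(columns, target="claim")
print(ranking)
```

Features at the top of the ranking are the candidates to keep; the full matrix additionally shows which features are strongly correlated with each other and may be redundant.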

Model Training & Evaluation
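As a minimal sketch of what training and evaluation involve, the code below fits a binary logistic regression by batch gradient descent and measures accuracy on held-out examples. It is not the model built in this project: the single feature, the data, the learning rate and the number of epochs are all hypothetical, and a library implementation would normally be used instead.

```python
import math


def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def predict_proba(w, b, x):
    """Probability of class 1 for one example x (a list of feature values)."""
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))


def train_logistic_regression(X, y, lr=0.5, epochs=2000):
    """Fit weights and bias by batch gradient descent on the average log-loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            error = predict_proba(w, b, xi) - yi  # gradient of log-loss w.r.t. z
            for j in range(d):
                grad_w[j] += error * xi[j]
            grad_b += error
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def accuracy(w, b, X, y, threshold=0.5):
    """Fraction of examples whose thresholded prediction matches the label."""
    predictions = [1 if predict_proba(w, b, x) >= threshold else 0 for x in X]
    return sum(p == t for p, t in zip(predictions, y)) / len(y)


# Hypothetical, already-scaled data: one feature, label 1 means "claim".
X_train = [[0.1], [0.2], [0.3], [0.4], [0.6], [0.7], [0.8], [0.9]]
y_train = [0, 0, 0, 0, 1, 1, 1, 1]
X_test = [[0.05], [0.95]]
y_test = [0, 1]

w, b = train_logistic_regression(X_train, y_train)
print("test accuracy:", accuracy(w, b, X_test, y_test))
```

Keeping the test examples out of training is what makes the reported accuracy an estimate of performance on unseen buildings rather than a measure of memorisation.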

Final Test

Conclusion

To conclude, you now know what regression is and how you can implement it for classification with Python.

We also explored the rule and intuition behind logistic regression to better explain the mathematical relationship underlying the model developed in Python.

Finally, we discussed the performance of our simple regression analysis compared to other basic machine learning techniques built on the concept of regression.